The CMU statistical machine translation system for IWSLT 2005

Authors

  • Sanjika Hewavitharana
  • Bing Zhao
  • Almut Silja Hildebrand
  • Matthias Eck
  • Chiori Hori
  • Stephan Vogel
  • Alexander H. Waibel
Abstract

In this paper we describe the CMU statistical machine translation system used in the IWSLT 2005 evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. We experimented with two different phrase extraction methods: PESA on-the-fly phrase extraction and an alignment-free extraction method. The translation model, language model and other features were combined in a log-linear model during decoding. We present our experiments on model adaptation for new data in a different domain, as well as on combining different translation hypotheses to obtain better translations. We participated in the supplied data track for manual transcriptions in the translation directions Arabic-English, Chinese-English, Japanese-English and Korean-English. For the Chinese-English direction we also worked on ASR output of the supplied data, and with additional data in the unrestricted and C-STAR tracks.
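As a rough illustration of the log-linear combination mentioned in the abstract, the sketch below scores each translation hypothesis as a weighted sum of log-domain feature values and keeps the highest-scoring one. The feature names (`tm`, `lm`, `length`), the weights, and the example hypotheses are all hypothetical; they are not taken from the CMU system, whose actual features and weights are described only in the full paper.

```python
import math

def loglinear_score(features, weights):
    """Return sum_i lambda_i * h_i over the features of one hypothesis."""
    return sum(weights[name] * value for name, value in features.items())

def best_hypothesis(hypotheses, weights):
    """Pick the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))

# Hypothetical feature weights (in practice these would be tuned on held-out data).
weights = {"tm": 1.0, "lm": 0.5, "length": 0.1}

# Hypothetical candidates: translation-model and language-model log-probabilities
# plus a word-count feature.
hypotheses = [
    {"text": "where is the station",
     "features": {"tm": math.log(0.4), "lm": math.log(0.2), "length": 4}},
    {"text": "where station is",
     "features": {"tm": math.log(0.5), "lm": math.log(0.01), "length": 3}},
]

best = best_hypothesis(hypotheses, weights)
print(best["text"])
```

In this toy setting the second candidate has the better translation-model score, but its poor language-model score outweighs that advantage once the features are combined.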


Similar resources

The UKA/CMU statistical machine translation system for IWSLT 2006

This paper describes the UKA/CMU statistical machine translation system used in the IWSLT 2006 evaluation campaign. The system is based on phrase-to-phrase translations extracted from a bilingual corpus. We compare two different phrase alignment techniques both based on word alignment probabilities. The system was used for all language pairs and data conditions in the evaluation campaign transl...


The CMU-UKA statistical machine translation systems for IWSLT 2007

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language-pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework but for each language-pair a specific research problem was tackled. For Japa...


The CMU-UKA syntax augmented machine translation system for IWSLT-06

We present the CMU-UKA Syntax Augmented Machine Translation System that was used in the IWSLT-06 evaluation campaign. We participated in the C-Star data track using only the Full BTEC corpus, for Chinese-English translation, focusing on transcript translation. We applied techniques that produce true-cased, punctuated translations from non-punctuated Chinese transcripts, generating translations ...


Edinburgh system description for the 2005 IWSLT speech translation evaluation

Our participation in the IWSLT 2005 speech translation task is our first effort to work on limited domain speech data. We adapted our statistical machine translation system that performed successfully in previous DARPA competitions on open domain text translations. We participated in the supplied corpora transcription track. We achieved the highest BLEU score in 2 out of 5 language pairs and ha...


Sehda s2MT: incorporation of syntax into statistical translation system

This paper describes Sehda’s SMT (Syntactic Statistical Machine Translation) system submitted to the Korean-English track in the evaluation campaign of the IWSLT-05 workshop. The SMT is a phrase-based statistical system trained on linguistically processed parallel data.



Publication date: 2005